# PAPEA – A Modular Pipeline for the Automation of Protest Event Analysis

## Organization of the dataset

This dataset contains replication data for the article "PAPEA – A Modular Pipeline for the Automation of Protest Event Analysis".
The Python and R scripts are located in the folders 'python_scripts' and 'r_scripts'.

- 'python_scripts/1_papea_pipeline_python.ipynb' is a Jupyter notebook for the first part of the PAPEA pipeline
- 'r_scripts/2_papea_pipeline_R.Rmd' is an R Markdown script for the second part of the PAPEA pipeline

These files represent the complete PAPEA pipeline. They do not replicate the computations reported in the paper; instead, they provide the code to run the pipeline on your own data. An example dataset of articles from the German daily newspaper "taz - die tageszeitung", on which the pipeline can be tested, is provided in the 'data' folder ('data/taz2015_sample.csv').
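
Before running the pipeline, it can help to check that the example file loads and to see which columns it contains. The following is a minimal sketch and not part of the PAPEA scripts: the helper `preview_csv` is a hypothetical name, and the comma delimiter and UTF-8 encoding are assumptions about the file that may need adjusting.

```python
import csv
from pathlib import Path


def preview_csv(path, n_rows=5):
    """Return the column names and the first n_rows rows of a CSV file.

    Hypothetical helper for illustration; it assumes a comma-separated,
    UTF-8 encoded file with a header row.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = []
        for index, row in enumerate(reader):
            if index >= n_rows:
                break
            rows.append(row)
        # fieldnames is None for an empty file, so fall back to an empty list
        return list(reader.fieldnames or []), rows


if __name__ == "__main__":
    sample = Path("data/taz2015_sample.csv")
    if sample.exists():
        columns, first_rows = preview_csv(sample)
        print("Columns:", columns)
        print("Rows previewed:", len(first_rows))
    else:
        print(f"{sample} not found; run this from the dataset root folder.")
```

Run the snippet from the root folder of the dataset so that the relative path resolves. The same check works for your own article file by passing its path to `preview_csv`.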

- 'r_scripts/3_papea_evaluate_predictions.Rmd' is an R Markdown script that replicates the evaluation of the language models' performance reported in Table 1b and Table 2
- 'r_scripts/4_papea_appendix.Rmd' is an R Markdown script that replicates all computations in the appendix